Keiichiro TAKADA Yasuaki TOKUMO Tomohiro IKAI Takeshi CHUJOH
Video-based point cloud compression (V-PCC) uses video compression technology to efficiently encode dense point clouds, providing state-of-the-art compression performance with a relatively small computational burden. V-PCC converts 3-dimensional point cloud data into three types of 2-dimensional frames, i.e., occupancy, geometry, and attribute frames, and encodes them via video compression. However, the quality of these frames may be degraded by video compression. This paper proposes an adaptive neural network-based post-processing filter on attribute frames to alleviate the degradation problem. Furthermore, a novel training method using occupancy frames is studied. The experimental results show average BD-rate gains of 3.0%, 29.3% and 22.2% for Y, U and V, respectively.
Ryozo TAKAHASHI Takuji MIKI Makoto NAGATA
This brief presents a side-channel attack (SCA) technique on a high-speed asynchronous successive approximation register (SAR) analog-to-digital converter (ADC). The proposed dual neural network, based on multiple noise waveforms, separately discloses the sign and absolute value information of input signals, which are hidden by the differential structure and high-speed asynchronous operation. The target SAR ADC and on-chip noise monitors are designed on a single prototype chip for SCA demonstration. Experimental results from the prototype, fabricated in a 40 nm process, show that the proposed attack on the asynchronous SAR ADC successfully restores the input data with a competitive accuracy within 300 mV rms error.
Takehiro TAKAYANAGI Kiyoshi IZUMI
Personalized stock recommendations aim to suggest stocks tailored to individual investor needs, significantly aiding investors' financial decision making. This study shows the advantages of incorporating context into personalized stock recommendation systems. We embed item contextual information such as technical indicators, fundamental factors, and business activities of individual stocks. Simultaneously, we consider user contextual information such as investors' personality traits, behavioral characteristics, and attributes to create a comprehensive investor profile. Our model, validated on novel stock recommendation tasks, demonstrated a notable improvement over baseline models when these contextual features were incorporated. Consistent outperformance across various hyperparameters further underscores the robustness and utility of our model in integrating stocks' features and investors' traits into personalized stock recommendations.
A fully analog pipelined deep neural network (DNN) accelerator is proposed, which is constructed by using pipeline registers based on master-slave switched capacitors. The master-slave switched capacitor is an analog equivalent of the delayed flip-flop (D-FF), which has been used as a digital pipeline register. To estimate the performance of the pipeline register, it is applied to a conventional DNN that performs non-pipelined operation. Compared with the conventional DNN, the cycle time is reduced by 61.5% and the data rate is increased by 160%. The accuracy reaches 99.6% in the MNIST classification test. The energy consumption per classification is reduced by 88.2% to 0.128 µJ, achieving an energy efficiency of 1.05 TOPS/W and a throughput of 0.538 TOPS in a 180 nm technology node.
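The reported cycle-time and data-rate figures agree with each other if the data rate is taken to be the reciprocal of the cycle time. That relation is an assumption here, not something the abstract states. The short check below also derives the power implied by the reported throughput and efficiency, which is a derived figure and not one the abstract reports.

```python
# Arithmetic check of the figures reported in the abstract.
# Assumption: data rate is the reciprocal of cycle time.
cycle_time_reduction = 0.615  # reported: cycle time reduced by 61.5%

# New cycle time as a fraction of the old one, and the resulting rate gain.
relative_cycle_time = 1.0 - cycle_time_reduction
data_rate_increase = 1.0 / relative_cycle_time - 1.0

# Implied power: throughput (TOPS) divided by efficiency (TOPS/W) gives watts.
throughput_tops = 0.538
efficiency_tops_per_watt = 1.05
implied_power_watts = throughput_tops / efficiency_tops_per_watt

print(f"data rate increase: {data_rate_increase:.1%}")
print(f"implied power: {implied_power_watts:.3f} W")
```

Under that assumption the gain comes out at roughly 160%, matching the reported value.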
Thin Tharaphe THEIN Yoshiaki SHIRAISHI Masakatu MORII
With a rapidly escalating number of sophisticated cyber-attacks, protecting Internet of Things (IoT) networks against unauthorized activity is a major concern. The detection of malicious attack traffic is thus crucial for IoT security to prevent unwanted traffic. However, existing malicious traffic detection systems that rely on supervised machine learning need a considerable number of benign and malware traffic samples to train their models. Moreover, in the case of zero-day attacks, only a few labeled traffic samples are available for analysis. To deal with this, we propose a few-shot malicious IoT traffic detection system with a prototypical graph neural network. The proposed approach does not require prior knowledge of network payload binaries or network traffic signatures. The model is trained on labeled traffic data and tested to evaluate its ability to detect new types of attacks when only a few labeled traffic samples are available. The proposed detection system first categorizes the network traffic as a bidirectional flow and visualizes the binary traffic flow as a color image. A neural network is then applied to the visualized traffic to extract important features. After that, using the proposed few-shot graph neural network approach, the model is trained on different few-shot tasks so that it generalizes to new, unseen attacks. The proposed model is evaluated on a network traffic dataset consisting of benign traffic and traffic corresponding to six types of attacks. The results show that the proposed model achieved F1 scores of 0.91 and 0.94 in 5-shot and 10-shot classification, respectively, and outperformed the baseline models.
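The few-shot idea can be illustrated with the plain prototypical-network decision rule: each class is represented by the mean of its few labeled embeddings, and a query is assigned to the nearest prototype. This is a minimal sketch of that generic rule only, not the graph neural network proposed in the paper. The function names, the 2-D embeddings, and the class labels are hypothetical.

```python
from math import fsum


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [fsum(column) / count for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """Map each class label to the mean of its few support embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2-D embeddings, three labeled samples (shots) per class.
support = {
    "benign": [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0]],
    "attack": [[0.9, 1.0], [1.1, 0.8], [1.0, 0.9]],
}
prototypes = build_prototypes(support)
print(classify([0.8, 0.7], prototypes))
```

In a real system the embeddings would come from the feature extractor applied to the traffic images; here they are hand-picked for illustration.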
In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent for linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyze the situation where noisy copies are newly generated and injected into the inputs at each epoch, i.e., the case of using on-line noisy copies. Therefore, this article can also be viewed as an analysis of noise injection into the training process by means of DA. We consider the training process under three situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We show that, in all cases, training for DA with on-line copies is approximately equivalent to l2-regularized training, for which the variance of the injected noise is important whereas the number of copies is not. Moreover, we show that DA with on-line copies apparently leads to an increase in the learning rate in the full-batch condition under the sum of squared errors and in the mini-batch condition under the mean squared error. The apparent increase in learning rate and the regularization effect can be attributed to the original input and the additive noise in the noisy copies, respectively. These results are confirmed in a numerical experiment, in which we found that our result can be applied to the usual off-line DA in an under-parameterized scenario but not in an over-parameterized scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis of linear regression can be qualitatively applied to neural networks.
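The link between input noise and l2 regularization can be checked numerically with a standard identity: for zero-mean Gaussian input noise with variance sigma^2, the expected sum of squared errors of a linear model equals the clean sum of squared errors plus n * sigma^2 * ||w||^2. The sketch below estimates the left-hand side by Monte Carlo over fresh noisy copies. It illustrates this identity only and is not the paper's gradient-descent analysis. The data, weights, and function names are made up for the example.

```python
import random


def squared_error(w, x, y):
    """Squared error of the linear prediction w.x against target y."""
    prediction = sum(wi * xi for wi, xi in zip(w, x))
    return (y - prediction) ** 2


def clean_loss(w, X, Y):
    """Sum of squared errors on the original (noise-free) inputs."""
    return sum(squared_error(w, x, y) for x, y in zip(X, Y))


def ridge_loss(w, X, Y, sigma):
    """Clean loss plus the l2 penalty n * sigma^2 * ||w||^2."""
    penalty = len(X) * sigma ** 2 * sum(wi ** 2 for wi in w)
    return clean_loss(w, X, Y) + penalty


def noisy_copy_loss(w, X, Y, sigma, copies, rng):
    """Average sum of squared errors over freshly generated noisy copies."""
    total = 0.0
    for _ in range(copies):
        for x, y in zip(X, Y):
            noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
            total += squared_error(w, noisy_x, y)
    return total / copies


# Made-up data set with five samples and two input dimensions.
X = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0], [1.0, 1.0], [3.0, 1.0]]
Y = [-2.5, -2.0, 1.0, -1.5, 2.0]
w = [1.0, -2.0]
sigma = 0.5

estimate = noisy_copy_loss(w, X, Y, sigma, copies=20000, rng=random.Random(0))
target = ridge_loss(w, X, Y, sigma)
print(f"noisy-copy average: {estimate:.3f}, regularized loss: {target:.3f}")
```

The penalty depends on the noise variance but not on how many copies are averaged, which is consistent with the statement that the variance matters and the number of copies does not.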
Keita IMAIZUMI Koichi ICHIGE Tatsuya NAGAO Takahiro HAYASHI
In this paper, we propose a method for predicting radio wave propagation using a correlation graph convolutional neural network (C-Graph CNN). We examine what kind of parameters are suitable to be used as system parameters in C-Graph CNN. Performance of the proposed method is evaluated by the path loss estimation accuracy and the computational cost through simulation.
Daiki HIRATA Norikazu TAKAHASHI
Convolutional Neural Networks (CNNs) have shown remarkable performance in image recognition tasks. In this letter, we propose a new CNN model called EnsNet, which is composed of one base CNN and multiple Fully Connected SubNetworks (FCSNs). In this model, the set of feature maps generated by the last convolutional layer in the base CNN is divided along channels into disjoint subsets, and these subsets are assigned to the FCSNs. Each of the FCSNs is trained independently of the others so that it can predict the class label of each feature map in the subset assigned to it. The output of the overall model is determined by majority vote of the base CNN and the FCSNs. Experimental results using the MNIST, Fashion-MNIST and CIFAR-10 datasets show that the proposed approach further improves the performance of CNNs. In particular, an EnsNet achieves a state-of-the-art error rate of 0.16% on MNIST.
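The two structural steps described above, dividing channels into disjoint subsets and combining predictions by majority vote, can be sketched as follows. This is a minimal illustration under assumptions: contiguous, near-equal channel groups and a tie-break in favour of the first prediction listed are choices made here, not details given in the abstract, and the function names are hypothetical.

```python
from collections import Counter


def split_channels(num_channels, num_subnets):
    """Divide channel indices into disjoint, contiguous, near-equal subsets."""
    base, extra = divmod(num_channels, num_subnets)
    subsets, start = [], 0
    for index in range(num_subnets):
        size = base + (1 if index < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Ten feature-map channels shared among three subnetworks.
print(split_channels(10, 3))

# Base CNN prediction first, followed by the subnetwork predictions.
print(majority_vote([7, 1, 7, 7, 1]))
```

Listing the base CNN first means it decides ties, which is one possible convention.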
Single image deraining is an ill-posed problem and a long-standing issue. In the past few years, convolutional neural network (CNN) methods have largely dominated computer vision and achieved considerable success in image deraining. Recently, Swin Transformer-based models have also shown impressive performance, even surpassing CNN-based methods and becoming the state of the art on high-level vision tasks. Therefore, we attempt to introduce the Swin Transformer to deraining tasks. In this paper, we propose a deraining model with two sub-networks. The first sub-network includes two branches. The Rain Recognition Network is a Unet with Swin Transformer layers, which preliminarily restores the background, especially at the locations where rain streaks appear. The Detail Complement Network extracts the background detail beneath the rain streaks. The second sub-network, called Refine-Unet, uses the output of the first to further restore the image. In experiments, our network achieves improvements on single image deraining compared with previous Transformer-based research.
Kazuhisa FUJIMOTO Masanori TAKADA
Neuromorphic computing with a spiking neural network (SNN) is expected to provide a complement or alternative to deep learning in the future. The challenge is to develop optimal SNN models, algorithms, and engineering technologies for real use cases. As a potential use case for neuromorphic computing, we have investigated person monitoring and worker support with a video surveillance system, given its status as a proven deep neural network (DNN) use case. In the future, to increase the number of cameras in such a system, we will need a scalable approach that embeds only a few neuromorphic devices in a camera. Specifically, this will require a shallow SNN model that can be implemented in a few neuromorphic devices while providing a high recognition accuracy comparable to that of a DNN with the same configuration. A shallow SNN was built by converting ResNet, a proven DNN for image recognition, and a new configuration of the shallow SNN model was developed to improve its accuracy. The proposed shallow SNN model was evaluated with a few neuromorphic devices, and it achieved a recognition accuracy of more than 80% with about 1/130 of the energy consumption of a GPU running a DNN with the same configuration as the SNN.
Wenrong XIAO Yong CHEN Suqin GUO Kun CHEN
An attention residual network with a triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, channel attention and spatial attention are connected in series within the residual connection of the residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better attend to weak changes in the bearing state. Second, the “triple feature” is used as the input of the attention residual network, so that the deep learning network can better grasp the trend of the bearing running state and better predict the RUL of the bearing. Finally, the method is verified on a set of experimental data. The results show that the method is simple and effective, has high prediction accuracy, and reduces manual intervention in RUL prediction.
Longjiao ZHAO Yu WANG Jien KATO Yoshiharu ISHIKAWA
Convolutional Neural Networks (CNNs) have recently demonstrated outstanding performance in image retrieval tasks. Local convolutional features extracted by CNNs, in particular, show exceptional capability in discrimination. Recent research in this field has concentrated on pooling methods that incorporate local features into global features and assess the global similarity of two images. However, the pooling methods sacrifice the image's local region information and spatial relationships, which are precisely known as the keys to the robustness against occlusion and viewpoint changes. In this paper, instead of pooling methods, we propose an alternative method based on local similarity, determined by directly using local convolutional features. Specifically, we first define three forms of local similarity tensors (LSTs), which take into account information about local regions as well as spatial relationships between them. We then construct a similarity CNN model (SCNN) based on LSTs to assess the similarity between the query and gallery images. The ideal configuration of our method is sought through thorough experiments from three perspectives: local region size, local region content, and spatial relationships between local regions. The experimental results on a modified open dataset (where query images are limited to occluded ones) confirm that the proposed method outperforms the pooling methods because of robustness enhancement. Furthermore, testing on three public retrieval datasets shows that combining LSTs with conventional pooling methods achieves the best results.
He LI Yutaro IWAMOTO Xianhua HAN Lanfen LIN Akira FURUKAWA Shuzo KANASAKI Yen-Wei CHEN
Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features for two-dimensional (2D) data, generally called 2D networks. However, 2D networks are inefficient in extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, the naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. Then we leveraged an ROI-estimation strategy to crop the ROI bounding box. Meanwhile, we used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address the issues of class imbalance and blurry contour in medical images. By incorporating the above strategies, we realized a practical end-to-end 3D medical image segmentation with high efficiency and accuracy. To validate the 3D segmentation performance of our proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results over the state-of-the-art methods.
Hiroyuki NOZAKA Kosuke KAMATA Kazufumi YAMAGATA
Data augmentation is known as a helpful technique for generating a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a recent study on artificial intelligence (AI) reported an augmentation method with low validity for image recognition. This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We applied three different data augmentation methods (rotation, scaling, and distortion) to original WBC images, and generated an AI model for WBC recognition with each method by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: The only significantly effective technique among the AI models for WBC recognition was data augmentation with rotation. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used to achieve high accuracy in AI generation with supervised training, we consider that it is necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.
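As an illustration of rotation-based augmentation, the sketch below expands one image into four by rotating it in 90-degree steps. The choice of 90-degree steps is an assumption made to keep the example lossless and dependency-free. The abstract does not state which rotation angles were used, and the function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_augment(image):
    """Return the image together with its 90, 180 and 270 degree rotations."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


# A tiny 2x2 "image" stands in for a white blood cell picture.
image = [[1, 2],
         [3, 4]]
for variant in rotation_augment(image):
    print(variant)
```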
Epileptic seizure prediction is an important research topic in clinical epilepsy treatment, as it can give epilepsy patients and medical staff the opportunity to take precautionary measures. EEG, which records the electrical discharge of the brain, is a commonly used tool for studying brain activity. Many studies based on machine learning algorithms have been proposed to solve this task using EEG signals. In this study, we propose novel seizure prediction models based on convolutional neural networks and scalp EEG for binary classification between preictal and interictal states. The short-time Fourier transform (STFT) is used to translate raw EEG signals into STFT spectra, which are applied as the input of the models. Fusion features are obtained through side-output constructions and used to train and test our models. The test results show that our models can achieve comparable results in both sensitivity and false prediction rate (FPR) with the fusion features. The proposed patient-specific model can be used in a seizure prediction system for EEG classification.
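The short-time Fourier transform step can be sketched with a naive implementation: the signal is cut into overlapping frames, each frame is multiplied by a window, and a discrete Fourier transform gives the magnitude per frequency bin. The frame length, hop size, Hann window, and test signal below are illustrative assumptions, not the settings used in the study.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len/2) per frame."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; fine for short frames, slow for long ones.
            value = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(value))
        frames.append(spectrum)
    return frames


# Synthetic signal: a sinusoid with exactly 3 cycles per 16-sample frame.
signal = [math.sin(2 * math.pi * 3 * n / 16) for n in range(64)]
spectrogram = stft_magnitudes(signal, frame_len=16, hop=8)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spectrogram]
print(len(spectrogram), peak_bins)
```

Stacking the per-frame spectra gives the time-frequency image that a convolutional network can take as input.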
Finger-vein-based deep neural network authentication systems have been widely applied in real scenarios, such as banking and entrance guard systems in various countries. However, to ensure performance, the deep neural network must train many parameters, which requires a lot of time and computing resources. This paper proposes a method that introduces artificial features with prior knowledge into the convolution layer. First, it designs a multi-direction pattern based on the traditional local binary pattern, which extracts general spatial information and also reduces the spatial dimension. Then, it establishes a sample-effective deep convolutional neural network by combining the pattern with convolution, giving it the ability to extract deeper finger vein features. Finally, it trains the model with a composite loss function to increase the inter-class distance and reduce the intra-class distance. Experiments show that the proposed method achieves higher stability and accuracy in finger vein recognition.
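For reference, the traditional local binary pattern that the multi-direction pattern builds on can be sketched as below: each pixel is compared with its eight neighbours, and the comparison results are packed into an 8-bit code. This shows the classic 3x3 operator only, not the multi-direction variant proposed in the paper. The bit ordering (clockwise from the top-left neighbour, least significant bit first) is one common convention chosen here as an assumption.

```python
# Neighbour offsets (row, column), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Classic 3x3 local binary pattern code for one interior pixel."""
    center = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(OFFSETS):
        # Set the bit when the neighbour is at least as bright as the center.
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """LBP codes for all interior pixels; the border is dropped."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
print(lbp_image(patch))
```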
Wenxin DONG Jianxun ZHANG Shuqiu TAN Xinyue ZHANG
In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements, and cannot detect fat content nondestructively, i.e., without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content from pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detail information expressed in the shallow layers and the semantic information expressed in the deep layers are fused. Finally, a deep convolutional network is used to predict the fat content, which is compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on a data set of 130 groups of pork B-ultrasound images, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify B-ultrasound images of pigs and predict the fat content with high accuracy.
Rong FEI Yufan GUO Junhuai LI Bo HU Lu YANG
With the widespread use of indoor positioning technology, the need for high-precision positioning services is rising; nevertheless, there are several challenges, such as the difficulty of simulating the distribution of indoor location data and the large inaccuracy of probability computation. To accurately predict indoor location coordinates, this paper therefore compares three neural network models for indoor location based on WiFi fingerprints: an indoor location algorithm based on an improved back-propagation neural network model, an RSSI indoor location algorithm based on neural network angle change, and an RSSI indoor location algorithm based on deep neural network angle change. Changing the action range of the activation function in the standard back-propagation neural network model achieves the goal of accurately predicting location coordinates. Experimental comparisons of loss rate (loss), accuracy rate (acc), and cumulative distribution function (CDF) show that the revised back-propagation neural network model has strong stability and enhances indoor positioning accuracy.
Xi ZHANG Yanan ZHANG Tao GAO Yong FANG Ting CHEN
The original single-shot multibox detector (SSD) algorithm has good detection accuracy and speed for regular object recognition. However, the SSD is not suitable for detecting small objects for two reasons: 1) the relationships among different feature layers with various scales are not considered, and 2) the predicted results are solely determined by several independent feature layers. To enhance its detection capability for small objects, this study proposes an improved SSD-based algorithm called proportional channels' fusion SSD (PCF-SSD). The PCF-SSD algorithm provides three enhancements. First, a fusion feature pyramid model is proposed that concatenates channels of certain key feature layers in a given proportion for object detection. Second, the default box sizes are adjusted for small object detection. Third, an improved loss function is suggested to train the proposed fusion model, which can further improve object detection performance. A series of experiments are conducted on the public database Pascal VOC to validate the PCF-SSD. Compared with the original SSD algorithm, our algorithm improves the mean average precision and the detection accuracy for small objects by 3.3% and 3.9%, respectively, with a detection speed of 40 FPS. Furthermore, the experimental results demonstrate that the proposed PCF-SSD achieves a better balance of detection accuracy and efficiency than the original SSD algorithm.
Xincheng CAO Bin YAO Binqiang CHEN Wangpeng HE Suqin GUO Kun CHEN
Tool condition monitoring is one of the core tasks of intelligent manufacturing in digital workshops. This paper presents an intelligent tool condition recognition method based on deep learning. First, an industrial microphone is used to collect the acoustic signal during machining; then, a central fractal decomposition algorithm is proposed to extract sensitive information; finally, a multi-scale convolutional recurrent neural network is used for deep feature extraction and pattern recognition. Multi-process milling experiments showed that the proposed method is superior to existing methods, with a recognition accuracy of 88%.